Goto

Collaborating Authors

 langevin algorithm


From stability of Langevin diffusion to convergence of proximal MCMC for non-log-concave sampling

Neural Information Processing Systems

We consider the problem of sampling distributions stemming from non-convex potentials with Unadjusted Langevin Algorithm (ULA). We prove the stability of the discrete-time ULA to drift approximations under the assumption that the potential is strongly convex at infinity.


Learning Latent Variable Models via Jarzynski-adjusted Langevin Algorithm

Neural Information Processing Systems

We utilise a sampler originating from nonequilibrium statistical mechanics, termed here Jarzynski-adjusted Langevin algorithm (JALA), to build statistical estimation methods in latent variable models. We achieve this by leveraging Jarzynski's equality and developing algorithms based on a weighted version of the unadjusted Langevin algorithm (ULA) with recursively updated weights. Adapting this for latent variable models, we develop a sequential Monte Carlo (SMC) method that provides the maximum marginal likelihood estimate of the parameters, termed JALA-EM. Under suitable regularity assumptions on the marginal likelihood, we provide a nonasymptotic analysis of the JALA-EM scheme implemented with stochastic gradient descent and show that it provably converges to the maximum marginal likelihood estimate. We demonstrate the performance of JALA-EM on a variety of latent variable models and show that it performs comparably to existing methods in terms of accuracy and computational efficiency. Importantly, the ability to recursively estimate marginal likelihoods--an uncommon feature among scalable methods--makes our approach particularly suited for model selection, which we validate through dedicated experiments.


From stability of Langevin diffusion to convergence of proximal MCMC for non-log-concave sampling

Neural Information Processing Systems

We consider the problem of sampling distributions stemming from non-convex potentials with Unadjusted Langevin Algorithm (ULA). We prove the stability of the discrete-time ULA to drift approximations under the assumption that the potential is strongly convex at infinity.


Learning Latent Variable Models via Jarzynski-adjusted Langevin Algorithm

Neural Information Processing Systems

We utilise a sampler originating from nonequilibrium statistical mechanics, termed here Jarzynski-adjusted Langevin algorithm (JALA), to build statistical estimation methods in latent variable models. We achieve this by leveraging Jarzynski's equality and developing algorithms based on a weighted version of the unadjusted Langevin algorithm (ULA) with recursively updated weights. Adapting this for latent variable models, we develop a sequential Monte Carlo (SMC) method that provides the maximum marginal likelihood estimate of the parameters, termed JALA-EM. Under suitable regularity assumptions on the marginal likelihood, we provide a nonasymptotic analysis of the JALA-EM scheme implemented with stochastic gradient descent and show that it provably converges to the maximum marginal likelihood estimate. We demonstrate the performance of JALA-EM on a variety of latent variable models and show that it performs comparably to existing methods in terms of accuracy and computational efficiency. Importantly, the ability to recursively estimate marginal likelihoods--an uncommon feature among scalable methods--makes our approach particularly suited for model selection, which we validate through dedicated experiments.


Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference

arXiv.org Machine Learning

Compositional score-based approaches to simulation-based inference (SBI) approximate the posterior over a shared parameter given $n$ independent observations by aggregating individually learned posterior scores: currently, there are two main propositions of such methods (Geffner et al. (2023), Linhart et al. (2026)). As the resulting composite score does not correspond to the score of any distribution along the forward diffusion path of the true multi-observation posterior, sampling from it via a reverse SDE leads to an irreducible bias. Annealed Langevin dynamics provides a principled alternative: it treats the composite score as the genuine score of a sequence of tractable bridging densities and samples from them in succession. When properly tuned, it could lead to a controllable bias. However, its hyperparameters, namely step sizes, the number of steps per level, and the number of annealing levels, have so far been chosen empirically. We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Furthermore, we show empirically that the tuning obtained in the Gaussian setting generalizes to more complex problems, thus providing a well-understood and theoretically grounded starting point for practitioners using compositional score-based approaches.


Decentralized Proximal Stochastic Gradient Langevin Dynamics

arXiv.org Machine Learning

Decentralized learning is a learning process in which data is distributed across computational agents or collected by individual agents, and model parameters are computed as the consensus of the agents. It has gained a lot of interest for applications where agents can collaboratively learn a predictive model without sharing their own data, but sharing only their local models with their immediate neighbors to generate a global model [He et al., 2018, Hendrikx et al., 2019, Arjevani et al., 2020]. We assume there are N agents who are connected over an undirected communication network G = (V,E) where V = {1,...,N} represents the agents and E V V denotes the set of edges; i.e., if agent i and j are connected then (i,j) E implies (j,i) E. Suppose we have a collection of n independent and identically distributed (i.i.d.) data pairs zi = (ai,yi), where ai Rp is the feature vector and yi the label or response of the i-th observation. Let Z = [z1,z2,,zn] Rnp be sampled from the distribution p(Z|x) where the parameter x Rd has a common prior. The goal is to sample from the posterior distribution p(x|Z) p(Z|x)p(x) by distributing Z among N agents such that Zi = {zi1,zi2,,zini} is the subset of data exclusive to agent i.


Double Randomized Underdamped Langevin with Dimension-Independent Convergence Guarantee

Neural Information Processing Systems

This paper focuses on the high-dimensional sampling of log-concave distributions with composite structures: p (dx) exp( g(x) f(x))dx. We develop a double randomization technique, which leads to a fast underdamped Langevin algorithm with a dimension-independent convergence guarantee.



Double Randomized Underdamped Langevin with Dimension-Independent Convergence Guarantee Y uanshi Liu, Cong Fang, Tong Zhang School of Intelligence Science and Technology, Peking University

Neural Information Processing Systems

Sampling from a high-dimensional distribution serves as one of the key components in statistics, machine learning, and scientific computing, and constitutes the foundation of the fields including Bayesian statistics and generative models [Liu and Liu, 2001, Brooks et al., 2011, Song et al.,


Double Randomized Underdamped Langevin with Dimension-Independent Convergence Guarantee Y uanshi Liu, Cong Fang, Tong Zhang School of Intelligence Science and Technology, Peking University

Neural Information Processing Systems

Sampling from a high-dimensional distribution serves as one of the key components in statistics, machine learning, and scientific computing, and constitutes the foundation of the fields including Bayesian statistics and generative models [Liu and Liu, 2001, Brooks et al., 2011, Song et al.,